Clustering of Web Usage Data Using Fuzzy Tolerance Rough Set Similarity and Table Filling Algorithm
Abstract
Web usage mining is the application of data mining techniques to discover usage patterns from Web server log files, in order to understand and better serve the requirements of Web-based applications. It comprises three main steps: data preprocessing, pattern discovery, and analysis of the discovered patterns. One of its most important tasks is to find groups of users who exhibit similar browsing patterns; grouping Web transactions into clusters is important for understanding users' navigational behavior. Several families of clustering algorithms, including partition-based, distance-based, density-based, grid-based, hierarchical, and fuzzy clustering, are used to find clusters in Web usage data. In this paper we propose an approach for clustering Web usage data based on fuzzy tolerance rough set theory and the table filling algorithm. First, we construct sessions using a concept hierarchy and link information. The similarity between two sessions is approximated using a rough set tolerance relation. The tolerance relation is then reformulated into an equivalence relation using fuzzy tolerance, and the clusters are obtained with a modified table filling algorithm. We provide experimental results of the fuzzy rough set similarity and table filling algorithm on the MSNBC Web navigation data set. For the overall study and analysis, we consider the server log files of the website www.enggresources.com.
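The pipeline outlined in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual formulas, so everything here is an assumption made for illustration: sessions are modelled as sets of page identifiers, Jaccard overlap stands in for the rough set tolerance similarity, a max-min transitive closure stands in for the fuzzy step that turns the tolerance relation into an equivalence relation, and the pair-marking table is only a loose analogue of the modified table filling algorithm. The function names `jaccard`, `similarity_matrix`, `max_min_closure`, and `table_filling_clusters` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Assumed similarity between two sessions given as sets of pages."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_matrix(sessions):
    """Reflexive, symmetric (tolerance-like) fuzzy relation over sessions."""
    n = len(sessions)
    return [[jaccard(sessions[i], sessions[j]) for j in range(n)]
            for i in range(n)]


def max_min_closure(rel):
    """Max-min transitive closure of a fuzzy tolerance relation.

    The result is max-min transitive, so every alpha-cut of it is an
    ordinary equivalence relation.
    """
    n = len(rel)
    cur = [row[:] for row in rel]
    while True:
        nxt = [[max(cur[i][j],
                    max(min(cur[i][k], cur[k][j]) for k in range(n)))
                for j in range(n)]
               for i in range(n)]
        if nxt == cur:  # only max/min are applied, so values never drift
            return cur
        cur = nxt


def table_filling_clusters(eq, alpha):
    """Mark pairs below alpha as distinguishable; group the unmarked ones."""
    n = len(eq)
    marked = {(i, j) for i, j in combinations(range(n), 2)
              if eq[i][j] < alpha}
    clusters, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        cluster = [i] + [j for j in range(i + 1, n)
                         if j not in assigned and (i, j) not in marked]
        assigned.update(cluster)
        clusters.append(cluster)
    return clusters


# Toy sessions (made-up page identifiers, not taken from any data set).
sessions = [{"a", "b", "c"}, {"a", "b"}, {"b", "c", "d"},
            {"x", "y"}, {"x", "y", "z"}]
eq = max_min_closure(similarity_matrix(sessions))
print(table_filling_clusters(eq, alpha=0.5))  # [[0, 1, 2], [3, 4]]
```

Raising the hypothetical `alpha` threshold to 0.6 splits the first group into `[0, 1]` and `[2]`, which shows how the cut level controls cluster granularity in this sketch.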
Similar resources
Interval set clustering of web users using modified Kohonen self-organizing maps based on the properties of rough sets
Web usage mining involves the application of data mining techniques to discover usage patterns from Web data. Clustering is one of the important functions in Web usage mining. The likelihood of bad or incomplete Web usage data is higher than in conventional applications. The clusters and associations in Web usage mining do not necessarily have crisp boundaries. Researchers have studied the pos...
An Efficient Agglomerative Clustering Algorithm for Web Navigation Pattern Identification
Web log mining is the analysis of Web log files containing Web page sequences. Discovering user access patterns from Web access logs is necessary for building adaptive Web servers, improving e-commerce, carrying out cross-marketing, personalizing the Web, predicting Web access sequences, etc. In this paper, a new agglomerative clustering technique is proposed to identify users with similar interests, and to ...
Rough set based User profiling for Web Personalization
Web usage mining has recently emerged as a basis for extracting useful user access pattern information, such as user profiles, from enormous amounts of Web log data for web site personalization. A profile can consist of a set of URLs that are relevant to the sessions assigned to a given cluster. Once these profiles are discovered, they can be exploited as part of an automated personalization on...
Neighborhood Clustering of Web Users With Rough K-Means
Data collection and analysis in Web mining face certain unique challenges. For a variety of reasons inherent in Web browsing and Web logging, the likelihood of bad or incomplete data is higher than in conventional applications. The analytical techniques in Web mining need to accommodate such data. Fuzzy and rough sets provide the ability to deal with incomplete and approximate information. Fuz...
Fuzzy set and rough set based evaluation algorithm of web customers
A fuzzy algorithm for evaluating Web customers based on rough sets is presented. Key attributes can be obtained through rough sets. Evaluating the data objects on the basis of these key attributes reduces the data size and the algorithm's complexity. After clustering analysis of the customers, the evaluation analysis is applied to the clustered data. There are a lot of uncertain data in customer cluste...